DOCUMENT RESUME

ED 146 539                                             CS 003 620

AUTHOR          Calfee, Robert; Juel, Connie
TITLE           How Theory and Research on Reading Assessment Can
                Serve Decision-Makers.
SPONS AGENCY    Carnegie Foundation for the Advancement of Teaching,
                New York, N.Y.
PUB DATE        77
NOTE            35p.; Paper presented at the Minnesota Perspectives
                on Literacy Conference (Minneapolis, Minnesota, June
                1977); See related document CS 003 621

EDRS PRICE      MF-$0.83 HC-$2.06 Plus Postage.
DESCRIPTORS     Decision Making; Diagnostic Tests; *Educational
                Assessment; Elementary Secondary Education; *Models;
                *Reading Processes; *Reading Research; *Reading
                Tests; Testing Problems
IDENTIFIERS     *Interactive Reading Assessment System; *Minnesota
                Educational Assessment Program

ABSTRACT
     After reviewing the information that teachers and other decision
makers need to have about student achievement and some recent advances
in the theory and practice of reading assessment, the author makes a
number of recommendations for improving assessment programs. These
include: do less massive, broad-band testing, but improve the
reliability and informativeness of what testing is done; look to
instruction as the model for what to test, and then consider the
influence of the testing situation, the tester, and the materials; be
sure the information will be organized in a useful way, around
theoretical models of the reading process. Too often, decision makers
have the option of too little information (a single test score) or too
much information (a myriad of behavioral objective scores). In
developing these arguments, the Minnesota Educational Assessment
Program and the Interactive Reading Assessment System are analyzed to
illustrate both problems and alternative approaches. (AA)









HOW THEORY AND RESEARCH ON READING ASSESSMENT
CAN SERVE DECISION-MAKERS







¹ Preparation of this paper was supported in part by a grant from
the Carnegie Foundation. We are grateful to Priscilla Drum, Dorothy
Piontkowski, Barbara Tanner, Kay Thoresen, and Barbara Tingey for their
assistance.



HOW THEORY AND RESEARCH ON READING ASSESSMENT CAN SERVE DECISION-MAKERS¹
Robert Calfee and Connie Juel, Stanford University

The controversy over educational testing continues to make headlines
in newspapers and bold type on the covers of professional journals. The
actual source of the uneasiness varies somewhat from one complainant to
another, e.g., cultural bias, cost and time, need, etc. But frequently
people express concerns with the inappropriateness of present measures of
school achievement. Because of the perceived importance of reading to
success in other school subjects, reading tests are challenged with special
force. Parents are generally (not always) mystified by achievement tests
but believe that their child's measured performance means something -- it
does. Teachers often express the feeling that present reading achievement
tests don't measure what they teach -- they are generally right.
Administrators hope that their achievement scores will go up rather than
down -- about half the time their prayers are answered. (Hopes and prayers
are the proper terms, for administrators are hard-pressed to find clear
evidence that helps them act to improve reading scores.) Finally, school
board members and legislators must feel frustrated that with so many
resources being allocated to the improvement of reading, there is no clear
trend toward improvement nationwide that matches the resources -- and then
they hear experts say that no one knows what the tests measure anyway!

What Are the Answers to All These Problems?

First -  Existing reading tests do serve a useful purpose. As
         unidimensional indicators, they predict performance on other
         tests with remarkable accuracy.

Second - Present tests are extraordinarily inefficient -- we could
         probably cut time and effort to one-fourth or even one-tenth



Calfee/Juel 6-22-77

Reading Assessment Serves Decision-Makers



         and obtain the same information. Group-administered,
         multiple-choice tests one-quarter as long, administered to
         half the number of students, would provide adequate data for
         the purposes for which they are suitable, at considerable
         reduction in cost, time, and effort for everyone.

Third -  Present tests are not appropriate for all the uses to which
         they are put -- they predict general success or failure in
         school, they call for action, but they do not tell what action
         to take.

We suspect that a lot of energy presently goes into teaching children
what they already know (Durkin, 1974). Given limited resources, this
inefficiency is troubling. Tests could provide evidence to highlight areas
of need, but to do this requires tests that generate differential profiles
that reveal relative strengths and weaknesses -- for student, for class, for
school, and for district. Existing standardized achievement tests do not
provide reliable profiles. Subtests exist, to be sure, but subscores with
quite different labels are highly correlated with one another, and
investigators have shown that the "profiles" are unreliable. Our own work
has shown that existing methods of establishing total test reliability are
inimical to the creation of tests that yield reliable profile information.

Criterion-referenced and behavioral-objective tests have different kinds
of problems. Aside from the fact that many look like standardized
achievement tests, the teacher and others face a wide array of unorganized
data (e.g., school administrators have found it difficult to make use of the
findings of National Assessment of Educational Progress, U.S. GAO, 197[?]).

We are convinced that differential profiles exist, and that we have
the methodology to measure them. Advances in differential diagnosis
for decision making are close at hand, in our opinion. These advances will
build on two recent developments -- adequate theoretical models of the
reading process, and redefinition of the concept of test reliability.

Theoretical Models of Reading

So many diverse theories of reading exist, and they have proven to be
of such little use in solving practical problems, that you might wonder why
we turn at this juncture to the notion of theory. Nothing is so practical
as a good theory, it is said, and in many areas of applied science this
epigram has proven true. In principle, an adequate theory of reading
should point us to appropriate methods of test design and construction, and
should direct us to proper techniques of analysis and interpretation of the
results. I believe that we can find such guidance in a theory, though
probably not from the complicated models that many have proposed (Calfee,
1975).

The independent-process theory which we will describe below appears
unduly simple, but it has powerful consequences (Calfee, 1976). And though
lacking the intricacies of a computer simulation model or the elegance of
a mathematical derivation, it does have practical consequences.

Independent-process theory rests on the assumption that the mind
carries out certain activities through the operation of independent
cognitive processes -- by analogy, the mind operates like a
"works-in-a-drawer" television, rather than through a complexly interwoven
and interactive network of processes (Figure 1). Psychologists and educators
often say that people are complicated -- and they probably are in some ways.
But for certain purposes, including the design of reading assessment
systems, a few simple categories of mental processes may suffice to describe
the most important features of performance -- that is the essence of the
independent-process assumption.




As will become apparent, we also think that the categories of
independent processes in reading are closely linked to what is taught -- in
the case of derived skills like reading, people learn what they are taught,
and they learn independently what they are taught independently. Thus,
as applied to reading, the theory assumes the existence of separable skills,
like decoding, vocabulary, and comprehension. To assess these as
independent skills, we need clean subtests, which minimize the contribution
of ancillary skills. We also need to introduce systematic variation in the
content and context of testing, and the critical data include comparisons
between performance in one set of circumstances and another.

An Example of an Independent-Process Model

Let's see how the principle of process-independence applies to the
assessment of a student's ability to read and understand single words.
The task we have in mind is a common one at the primary school level. The
student is shown a list of words selected to represent a particular "level
of difficulty." He is asked first to pronounce each word, and then to
demonstrate that he understands a common meaning of the word.

What thought processes must the student bring to bear on the task in
order to perform successfully? What are possible patterns of failure,
and what do these patterns mean for instruction? The information-processing
model for test design (Figure 2) incorporates three processes: attention,
decoding, and lexical interpretation. We will look at each of these in
turn.

First, we consider how the student attends to the task. This process
is a complex entity in its own right, including the overall level of
activity, the extent to which the student selects relevant cues and rejects
irrelevant information, and the degree to which the student can concentrate
the maximum available mental capacity on the task (Piontkowski & Calfee,
in press). For our present purposes we will lump all of these into a single
"box." We plan to influence this process by variation of a general character,
and we will measure it in generic fashion. We include this process in the
model because it seems likely to influence the operation of the other two
processes, and because specialists in learning disability have identified
attentional dysfunction as an important reason for reading failure (Ross,
1976). The design of the assessment system allows us to test this hypothesis
for each individual student.

The second process, decoding, handles the translation of print into
spoken language. Undoubtedly, there are subprocesses that handle specific
aspects of the translation task, but for our purposes we again consider
these as an aggregate.

The third process, lexical interpretation, refers to the student's
ability to demonstrate a common meaning of a word presented in isolation.
One may argue, and rightly so, that during the silent reading of connected
prose, the student thinks in a quite different manner than when he is shown
a word in isolation and asked what it means. The point is well taken, but
irrelevant to the present situation. Students are asked to do both tasks
as part of learning to read, and the high correlation between performance
on the two tasks suggests that they share a number of elements in common.

Once the model is specified, the next step in test design is to
designate one or more factors -- variations in testing conditions -- that are
likely to strongly influence the operation of each process. An example
of a relevant factor is shown above each of the processes in Figure 2.
For instance, in the case of attention, it seems to us that the operation
of that process should lead to better overall performance when the student
is individually tested in a quiet room than when he is tested with a group
in a noisy room. We also propose that regularity of the letter-sound
correspondences of the stimulus words should affect the decoding process,
and that familiarity of the words should influence the lexical
interpretation process. The design of the test includes all combinations of
the factors, and so each student is tested under all combinations. Thus, in
one set of situations the student is taken into a quiet room and asked to
pronounce and to define words from combinations of letter-sound regularity
and familiarity. The testing is then repeated with different words from
the same design in a regular, noisy, crowded classroom.
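
The crossing of factors described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the factor names and level labels are hypothetical stand-ins for the three factors named in the text (testing situation, letter-sound regularity, word familiarity), each taken at two levels, and nothing here is drawn from the actual test materials.

```python
from itertools import product

# Hypothetical labels for the three design factors described in the text.
FACTORS = {
    "setting": ("quiet room, individual", "noisy classroom, group"),
    "letter_sound": ("regular", "irregular"),
    "familiarity": ("familiar", "unfamiliar"),
}


def full_factorial(factors):
    """Return every combination of factor levels, one dict per design cell."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = full_factorial(FACTORS)
for cell in cells:
    print(cell)
```

With three two-level factors the design has eight cells, and each student would contribute a score in every cell.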

Having specified variations that influence each process, we next want
to find a way to measure the operation of each process. We recommend
choice of the most direct measures possible. Thus, in addition to recording
the correctness of the Pronunciation and Definition, the tester also
records the student's Concentration on the task as a general measure of
attention.

The purpose of the design variations is to measure the student's
performance under different conditions, as a way of discovering relative
strengths and weaknesses. This principle is akin to the clinical tester
who, besides noting a person's overall intelligence test score, also
considers the difference between the verbal and performance subtests.
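
As a rough illustration of what "relative strengths and weaknesses" might look like numerically, the sketch below computes an overall score and one contrast per factor for a single student. Everything in it is assumed for the example: the level labels, the scoring unit (words correct out of ten per cell), and the made-up student whose only difficulty is with irregular words. It is not the scoring procedure of any actual instrument.

```python
from itertools import product
from statistics import mean

# Hypothetical two-level factors, in the order used for the score keys.
LEVELS = {
    "setting": ("quiet", "noisy"),
    "letter_sound": ("regular", "irregular"),
    "familiarity": ("familiar", "unfamiliar"),
}


def contrasts(scores):
    """Summarize one student's cell scores.

    scores maps (setting, letter_sound, familiarity) -> words correct.
    Returns the overall mean plus, for each factor, the mean at its first
    level minus the mean at its second level.
    """
    result = {"overall": mean(scores.values())}
    for i, (factor, (first, second)) in enumerate(LEVELS.items()):
        at_first = mean(v for k, v in scores.items() if k[i] == first)
        at_second = mean(v for k, v in scores.items() if k[i] == second)
        result[factor] = at_first - at_second
    return result


# Made-up student: 9 of 10 correct everywhere except on irregular words.
student = {}
for cell in product(*LEVELS.values()):
    student[cell] = 5 if cell[1] == "irregular" else 9

profile = contrasts(student)
print(profile)
```

For this invented student the overall score alone would say "middling"; the contrasts locate the difficulty in the letter-sound factor and show no effect of setting or familiarity.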

Reliability of Profiles

Most tests are designed to optimize the reliability of the total score
(Cronbach, 1976). The procedures we are proposing emphasize differences as
much or more than overall summary scores (Calfee & Drum, in press).

In general, reliability refers to the degree to which a measurement
is consistently reproducible. We can consider the consistency in performance
when a person is tested with one form of a test and then retested with a
slightly varied form. Several things have changed. The exact form and
content of the test have changed. The student has probably changed. He
may have learned something, he may have forgotten something, he may have
a headache now that he didn't have earlier. All these sources of
variability tend to influence the reliability in test-retest situations
(Cronbach, Gleser, Nanda, & Rajaratnam, 1972).

Test developers tend to emphasize within-test reliability. There
are several ways of thinking about this form of consistency (Cronbach, 1970,
Ch. 6). For instance, suppose you divide the test items at random in two
and correlate the two subscores. Repeat this operation for all possible
split-half divisions of the test, then compute the average correlation
between the half-scores (Cronbach, 1951). This provides a measure of the
extent to which each item contributes consistently to the total test score.
One way to obtain "perfect" intratest reliability is to use a test in which
the items are so homogeneous that the student either fails or passes all
items. Test developers, to the degree that they strive for high levels of
intratest reliability, are under pressure to eliminate test items that yield
divergent patterns of performance from one student to the next. The items
that remain seem likely to measure general performance characteristics
rather than performances that reflect specific instructional outcomes. So
if you want a perfectly reliable test, ask the same question twenty times.
Either a student knows the answer or he doesn't. This would be absurd, of
course, but in the limit it is the "ideal" toward which "reliability" aims.
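
The split-half procedure just described can be made concrete with a small sketch. The code below is a direct, brute-force reading of the verbal description (average the correlation between half-scores over every split of the items into two equal halves); the function names and the two tiny response matrices are our own inventions for illustration, and the sketch is not meant as a substitute for the formal treatment in the cited sources.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation, or None when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def mean_split_half(responses):
    """Average half-score correlation over all equal splits of the items.

    responses: one list of 0/1 item scores per student (even item count).
    """
    n_items = len(responses[0])
    correlations = []
    # Keep item 0 in the first half so that each split is counted once.
    for rest in combinations(range(1, n_items), n_items // 2 - 1):
        first = (0,) + rest
        second = [i for i in range(n_items) if i not in first]
        a = [sum(row[i] for i in first) for row in responses]
        b = [sum(row[i] for i in second) for row in responses]
        r = pearson(a, b)
        if r is not None:
            correlations.append(r)
    return sum(correlations) / len(correlations) if correlations else None


# "The same question asked four times": each student passes or fails all.
same_question = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
# Items 0-1 and items 2-3 tap two separable skills (invented data).
two_skills = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]

homogeneous = mean_split_half(same_question)
heterogeneous = mean_split_half(two_skills)
print(homogeneous, heterogeneous)
```

The repeated-question test reaches the "ideal" value of 1.0, while the test that distinguishes two skills scores lower on this index, even though it is the one that carries profile information.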

Maximizing intratest reliability is important when the test score is
to serve for a major decision, but it may be counterproductive for
instructional decision-making. Teachers need to know more than the student's
general ability. Individualization requires knowledge of diverse patterns
of performance on specific tasks for different students. For the teacher,
a "reliable" assessment instrument is more properly defined as one which
accurately and consistently indicates the specific patterns of instruction
that best fit the student's needs and capabilities.

We have discussed elsewhere detailed techniques for measuring the
reliability of profiles, and have illustrated the application of these
techniques to the design and analysis of reading tests (Calfee and Drum,
in press). The technical details are not relevant to our present purposes,
but several points deserve emphasis. First, differential information about
strengths and weaknesses in separable skill areas is needed for intelligent
decision-making. Second, in the design of most current reading tests,
"the reliability of the test" is established in a way that optimizes item
consistency with the total score. However, to obtain differential profile
information requires the development of tests where profile reliabilities
are optimized. Third, we suspect that increasing the reliability of
patterns will require test developers to minimize generalized task demands
and place emphasis on specific task demands in the construction of tests.
Such steps should enhance the validity of the tests in significant ways.

Evaluation of the MEAP

In this section we apply some of the previous ideas in a critical
evaluation of the Minnesota Educational Assessment Program (MEAP)
(Minnesota Department of Education, 1974). The stated purpose of the
Minnesota Assessment was to "examine the reading performance of Minnesota
students and determine which factors appear to account for a variation in
that performance. This report, and analysis of the results, gives a clearer
picture of how well students are reading and examines how groups of
students vary in performance. By describing the levels of reading
performance in Minnesota, the report presents to educators, policy makers,
and the lay public reliable information to use in the consideration of
alternative directions for educational policy" (Minnesota Department of
Education, 1974, p. 1).

The Minnesota Assessment is a generally fine piece of work of this
genre. A variety of tasks and content are represented in this
group-administered, multiple-choice test. The items are clearly laid out,
and the instructions fairly readable. A detailed analysis of the results was
carried out by several independent groups. At times the report has an air of
"committee writing," but this is inherent in a multiple perspective approach.
We did not have access to specific item analyses -- if these are not
available they would constitute an important addition to the report.

We will focus our critique on three points: (a) the relation between
program goals and the content and analysis of the data, (b) the test layout
and item construction characteristics, and (c) the format of presenting
measures and reporting data.

In a separate flyer (MSEAP -- Minnesota Statewide Educational Assessment
Program), the following questions are raised:

     How many students in your school district or in the state can read
     fluently enough to be considered basically literate? How many
     cannot? How many students read well enough to deal with materials
     demanding critical, judgmental reading skills? How many students
     read well enough to be successful in a college setting? Are their
     ambitions in line with their abilities?

These are good questions, and sufficiently important to deserve validated
answers. Unfortunately, no attempt is made anywhere in this large scale
data collection and analysis effort to validate the results. We will not
spend long on this point -- simply stated, it is crucial to obtain other
kinds of information on success in schooling as a validating criterion for
test instruments of this sort. All that we know from this assessment project
is how well the students do on tests.


A second point about the program goals concerns the way that answers
are provided. The test items from the Minnesota Assessment were used to
generate several indices: Basic Literacy, School Success, Reading for
Critical Evaluation and Citizenship, and Reading for Success in College.
These indices are reported for aggregated data separately for each index.
That is, one can find average performance in Basic Literacy as a function
of various demographic and ethnographic factors. However, nowhere is
information provided in a contrastive form, so as to show the relative
strengths and weaknesses in these areas for various subgroups in the
population. The reader can put some of the information together from the
report to highlight these strengths and weaknesses, but the report doesn't
do this job. It is not much of a secret to find out that low-SES minority
groups do poorly in virtually all of these areas -- what we also need to know
is the character of their relative strengths and weaknesses. The report
comes close to providing such information in Chapter 4, where "domain"
averages are given for several categories of factors. Two samples of data
for nine-year-old students are plotted in Figure 3, and it appears that the
sharpest group differences show up in the comprehension tasks. However,
averages can actually obscure underlying patterns, and what is needed are
actual profile statistics for students and schools (Calfee, 1976; Calfee &
Drum, in press). Incidentally, the Minnesota report is skimpy on descriptive
statistics, like sample size, measures of variability, and correlations,
which could provide a more complete picture of the results. Nonetheless, the
two profiles in Figure 3 suggest an interesting difference between the
effects of variation in SES (a relatively sharp contrast in Passage
Comprehension, compared with the other differences), and variation in
Attention (fairly constant decrements for the Low Attention group in all
domains). We think that information of this sort, sharpened and highlighted,
could provide a more useful basis for action than separate compilations of
test scores.

The reader may wonder why this is not a problem of "reporting." The
answer is, it is a conceptual matter, and not simply a question of how to
present data. Decision-makers at various levels need to begin thinking
more about what students can and cannot do in particular instances, rather
than focusing on overall levels of skill or weakness. For too many years,
the student's "average" performance, weighted to favor verbal and academic
skills, has served as a basis for making an overall judgment about that
child. It is possible to highlight the child's strength (Cohen, 1973);
it may be vital to deal with specific weaknesses. Thinking in this fashion
is also likely to lead to tests that are designed for optimally reliable
distinctions between significantly different areas of skill and knowledge.

Next, let us look at some of the items in the Minnesota Assessment.
In Figure 4 is a set of eight items testing knowledge of prefixes and
suffixes. We have several questions. First, why spend so much time
testing the concepts of "prefix" and "suffix"? Surely, other questions about
morphology are equally or more relevant to the child's level of vocabulary
competence. Each item takes some time and energy -- what other variations
in content and task could be substituted to yield additional information
about the student's skill and knowledge? For instance, one might ask the
student to add affixes that produce changes in meaning, or to show a
knowledge of how an added affix changes meaning. Second, what does an error
mean? The report includes some efforts to analyze error patterns, to be
sure. Nonetheless, the character of the items makes it difficult to know
precisely how to interpret an error. For instance, what if the student
hasn't learned these two "reading jargon" terms, but knows the underlying
concept of affixation? He is likely to miss all of the items, leading to
the mistaken conclusion that he understands nothing about the concept.

[Figure 4 about here]

Item content and task demands are important determinants of the
concept of testing. If the student believes that the test requires him to
look for "prefixes and suffixes" without regard to meaning, then the
student does well to check anything that might be a prefix or suffix. For
instance, mis and un are both prefixes sometimes, and the student might
spot them in items E and H. Nothing in the testing situation requires the
child to check to see whether such a judgment makes semantic sense. A
quick visual scan leads to errors that are promoted by the test design. It
is easy to "design" items that promote errors -- it takes considerably more
planning and tryout to find the conditions that promote success.

It is hard to overemphasize the influence of testing context. For
instance, in the report we read: "In the 'ignore the text' strategy, a
student seemingly reads the question and chooses a distractor which
represents common, but often inaccurate, knowledge" (p. 3[?]). A thoughtful
reader of the comprehension questions in the Minnesota Assessment might
wonder why a student would follow any other strategy. Many of the questions
hinge on external knowledge; more often than not, the student will be
correct if he answers on the basis of external knowledge. Reading the prose
wastes time, and adds little useful information. After enough instances of
this sort, the clever (or lazy) student will conclude that he should look
first at the questions, and only when uncertain return to the text.


The exercise from the Minnesota Assessment in Figure 5 presumably is
designed to tap vocabulary knowledge. However, the key to these questions
is conventionality. For instance, one might believe that zebras are
nervous all over, unlike horses. The student who is not familiar with real
zebras might also think that they are relatively hairier than a horse.
The student with some experience with zebras (picture books that stress
the stripedness of this animal) will be at an advantage. Item B is even
more dependent on conventionality (and sexism as well). We all know the
mother's role includes sewing torn pants. A less conventional mother might
decide to fold them up and put them aside -- the problem is Billy's, not
hers. The thoughtful and creative child might select "I don't know" as the
best answer. But conventionality dictates that "I don't know" is never a
proper answer on a test.

[Figure 5 about here]



For each of these items, the critical question is, what is being tested?
What does an error mean? What action should be taken by the decision-maker,
teacher, or lay person when confronting a group of students (or individuals)
who make mistakes on these items?

We will not follow this line further. However, for any test of this
general character, we believe it is a good idea to ask continuously: What
does the child have to know in order to succeed? What interpretation is to
be put on a failure? How can the test item be modified to gain a wider range
of information about the student's capabilities, and to ensure that the
skill and knowledge being tapped is measured in as clean, precise, and
uncontaminated fashion as possible?

The last point we want to make about the Minnesota Assessment concerns
reporting data in a way that makes them useful to decision-makers. The
Minnesota report contains a great deal of information. Several efforts
have been made to simplify the presentation, and to reduce the tremendous
amount of quantitative information. Decision-makers need descriptive
information in the simplest possible form.

Our complaint comes from the intrusion of unnecessary jargon and
acronyms. For instance, in Figure 6 is a portion of a table from the report
intended to show the relation between background factors and reading
performance. The information is interesting and relevant, but translation
into a comprehensible form is a time-consuming task for the expert, and
probably outside the competence of many of the "educators, policy makers,
and lay public" for whom the report is intended. Our point is simple --
researchers who prepare reports should keep the audience in mind.

Perhaps some of our points may seem niggling. However, we are firm in
the opinion that researchers and evaluators do have important information
to convey to policy makers and the general public. Many are skeptical about
the value of educational research and evaluation. This skepticism partly
reflects the complexity of the phenomenon. We feel that it also reflects
the failure of those who design, administer, analyze, and interpret test
results to do their best to provide useful information in a clear manner.
For better or worse, those of us who perform this task must be right in
everything we do for our work to be of value.






What a Teacher Needs from a Test

In the preceding sections we have looked at characteristics of an
assessment system that facilitate decision-making, illustrating the points
by a situation where decision-making is at a fairly high level. Teachers
also make decisions, and, in our opinion, the same principles apply at the
level of the classroom as at the higher level of state administrators and
legislators. Partly because the individual classroom situation is more
concrete and comprehensible, it may be easier to see the principles in
action at that level.

What kind of information does the teacher need from a test if the goal
is to improve instruction? First, the information is more useful if it
points directly to the appropriate instructional treatment. Finding out
that the student has not mastered the basic "long-short" vowel
correspondences in English gives some direction to the teacher. Being told
that the student "lacks adequate word attack skills" is less useful. And
information that the student "cannot grasp the abstract character of
letter-sound correspondences" may even be counterproductive: the teacher
may try to teach "the abstract character . . ."

Second, test information should reveal the student's unique pattern of
strengths and weaknesses, and not just his overall level of competence. The
typical reading achievement test may inform the teacher that the student
"reads at the 25th percentile," i.e., that seventy-five out of every hundred
students in the nation do better on the test than this particular student,
or it may show that the student performs two grade-level-equivalents below
expectation for his age. Such messages rarely surprise the competent
teacher. If the student is, in general, doing poorly (or well, or average),
the teacher does not need a standardized test to tell him so. Learning that
an old house you just bought is decrepit and in need of repair is no
surprise; hopefully you knew that when you bought it. It is more useful to
be told that the plumbing isn't as bad as it looks, whereas the apparently
solid floor joists are riddled by termites and need immediate attention.
Similarly, the teacher is helped by an assessment system that highlights
patterns of relative strengths and weaknesses, for example that a student's
understanding of the meaning of certain words is relatively less well
developed than his ability to decode them. Such patterns are often
undetectable in performance on a generalized test, especially if the
student performs poorly overall and the test is not appropriate to his
level of competence (Calfee, Drum, & Arnold, in press).
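The difference between an overall level and a pattern of relative strengths
can be made concrete with a small sketch. The skill names and scores below
are invented for illustration and are not IRAS data; the only point is that
two students with the same overall average can show opposite profiles once
each score is expressed relative to the student's own mean.

```python
from statistics import mean

def relative_profile(scores):
    """Return each skill score minus the student's own mean score."""
    overall = mean(scores.values())
    return {skill: round(value - overall, 1) for skill, value in scores.items()}

# Hypothetical percent-correct scores for two students (illustrative only).
student_a = {"decoding": 70, "vocabulary": 40, "comprehension": 40}
student_b = {"decoding": 30, "vocabulary": 60, "comprehension": 60}

for name, scores in (("A", student_a), ("B", student_b)):
    # Same overall mean for both students, but opposite relative profiles.
    print(name, mean(scores.values()), relative_profile(scores))
```

A single summary score would treat these two students alike; the profile
shows that one is relatively strong in decoding and the other relatively
weak.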

Third, the teacher needs to be able to discover the conditions under
which a student succeeds or fails on fairly specific tasks. A low score on
a standardized test of reading achievement means the student has not given
correct answers to many of the questions on a group-administered,
multiple-choice test. To do well on such a test requires numerous skills;
if the student fails, the test does not show which skills were lacking. For
example, the usual comprehension task requires the student to have "gotten
it all together"; it demands proficiency in word-attack skills, vocabulary
knowledge, syntax, and ability to grasp the structural relations in the
passage. Two students may both be "poor comprehenders," but for different
reasons. The label does not reveal the differences, and the teacher is left
without information needed to improve the situation.

The teacher can most easily determine the student's level of knowledge
by asking him to perform the same basic task under a variety of conditions.
For instance, perhaps the student who failed on the group-administered
comprehension test will succeed when the test is individually administered,
with care that the student understands the directions, that he reads the
[...] every question. Or perhaps success comes only when the student is
asked to read the passage aloud, and is given help on words he has trouble
pronouncing. What if the student comprehends only when he is helped to
understand words that, because of his level of language development, his
ethnic background, or his particular interests and experiences, are
unfamiliar to him? The student who comprehends when special care is taken
to motivate him for the test doesn't need more instruction in the subject
matter. His poor performance under regular conditions reflects something
other than poor reading skill (Goodnow, 1972). Similarly, the student who
can demonstrate understanding of a passage when he is helped to decode and
define difficult words does not need further instruction on comprehension;
he does need more training in decoding and vocabulary.
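The reasoning in this paragraph, where the same task is repeated under
different conditions and the least-supported condition that produces
success points to what the student needs, can be sketched as a simple
lookup. The condition names, their ordering, and the wording of each
interpretation are our own illustrative assumptions, not part of IRAS or
of any published procedure.

```python
# Conditions ordered from least to most support; all names are hypothetical.
CONDITIONS = [
    ("group_administered",
     "No special need indicated by this task."),
    ("individual_with_clear_directions",
     "Problem lies in the testing situation, not in reading skill."),
    ("motivated",
     "Poor performance reflects motivation, not reading skill."),
    ("help_with_decoding_and_vocabulary",
     "Needs training in decoding and vocabulary, not comprehension."),
]

def interpret(results):
    """Return the interpretation for the first condition the student passed.

    `results` maps a condition name to True (succeeded) or False (failed).
    Conditions that were not tried are treated as failures.
    """
    for condition, interpretation in CONDITIONS:
        if results.get(condition, False):
            return interpretation
    return "Fails under every condition tried; comprehension itself may need work."

print(interpret({"group_administered": False, "motivated": True}))
```

The design choice worth noting is that the output is a statement about
instruction rather than a score, which is the kind of information the
teacher is said to need.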

A fourth requirement for a test, if it is to be useful to the teacher,
is that the information be obtained cheaply and efficiently.
Administration, scoring, and interpretation must be quick and easy.
Otherwise, the teacher is unlikely to use the test, even though it gives
helpful and relevant information. The problem here is what Cronbach (1970,
pp. 602ff) calls the bandwidth-fidelity dilemma. The need is for a test
that covers a broad range of skills and knowledge, that provides variation
in the task requirements, and that has a "bottom" low enough and a "top"
high enough for the variety of students and the extent of learning over the
school year. Meeting all these criteria is not simple; however, we believe
it is possible to design reading tests that come close to meeting these
requirements.






A Practical Example: The Interactive Reading Assessment System

Our effort to apply these principles heuristically to improve reading
test design, relying on our intuitions about underlying mental processes,
is exemplified in IRAS (Interactive Reading Assessment System, Calfee &
Calfee, 1977). Concepts of test design will be illustrated by the section
of IRAS that measures comprehension skills.

We begin by laying out some of the major dimensions which influence
performance on a comprehension task. To be sure, comprehension is a complex
activity, involving numerous processes. Debate will continue for some time
about what the term really means, how exactly to measure it, and what tasks
to emphasize under this rubric. For our purposes, the questions have been
resolved in a practical manner. The basic comprehension task entails asking
a student to read a passage aloud, and then to respond to questions designed
to tap his ability to extract specific details of information contained in
the passage, to grasp relations among the facts, and to provide a reasonable
summary of the main themes. We prefer to have the student read aloud, not
because this is essential to comprehension, but because it provides direct
evidence on the student's level of success in translating the printed text.

The first and most important dimension is the "difficulty" of the
passage. If one collects a large sample of materials appropriate to the
interest and competence of elementary school children, these can be reliably
graded by experts according to the relative ease with which students can
read the passages. The features that enter into this dimension are partly
known at present; among these are passage length, familiarity of the
vocabulary (frequency of occurrence of words in print), syntactic
complexity, number of propositions, and degree to which the passage deals
with topics that arise in everyday experience. Various readability formulas
verify the existence of this dimension, and the degree to which one may
reliably place a particular passage somewhere on the scale (Gilliland, 1972;
Klare, 1974). These details are unimportant to our purposes, which are
satisfied by the selection of a wide variety of passages that vary in
"difficulty," whatever it means.

A second important dimension for our purposes is the difference
commonly referred to as "reading" versus "listening" comprehension. The
tester can ask the student to read the passage himself and then test his
understanding, or the tester can read the passage for the student,
encouraging him to scan the material as it is read, and then can test the
student's understanding. If the student fails when he reads for himself,
he may still do well with similar materials when the tester reads for him.
This contrast in performance has important implications for instruction,
especially when compared with a third outcome where the student does poorly
even when the material is read to him.

A third dimension, occasionally mentioned in test manuals but seldom
part of either test validation or interpretation, is the character of the
question asked. As noted earlier, one may ask the student to recall
details of a specific proposition, to put together relations between
propositions, or to summarize the structure of the passage. Various other
possibilities exist, but these are the main kinds of questions referred to
in most discussions of how to measure comprehension (Guzak, 1972). Surely
the tester/teacher would want to distinguish between the student whose
ability to handle a comprehension task was weak for all categories of
questions, and the student who regularly "got the facts" but couldn't
organize them.
Another dimension closely related to the type of question is the
response required: productive versus receptive. If the student is asked,
after reading a murder mystery, "Who committed the crime?" he must generate
the answer on his own, reaching into memory for possible alternatives, then
choosing the one that seems most plausible. If the student is asked, "Was
it the butler or the grandson?" searching memory for viable alternatives
is unnecessary. Only recognition is required, and the student may actually
use his knowledge of the world to make the choice without reading the
material at all. How often does a writer have someone murder his
grandfather? It must have been the butler.

Figure 7 shows how these dimensions are represented in sample
materials from IRAS. For efficiency, the student is asked to help locate his
level of competence. He looks at a graded series of passages (those in
the figure are from the fifth and ninth levels in a series from 1 to 14),
and tells the tester when he has reached a passage that he thinks he cannot
read. The tester then asks the student to read the preceding passage aloud,
and to answer several questions. If the student's reading performance is
poor, or if he fails to answer the questions satisfactorily, the tester
then asks him to read the next easier passage in the series. This procedure
is continued until the student achieves a satisfactory level of performance.
If the student is successful on the first passage, he is asked to read the
next more difficult passage, and so on until he reaches a level which is too
difficult for him.
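The level-finding procedure just described can be sketched in a few lines.
This is a minimal illustration under stated assumptions, not the IRAS
scoring rule: the function name, the `passes` callback standing in for the
tester's judgment of oral reading and answers, and the return convention
are all our own.

```python
def find_reading_level(start_level, passes, lowest=1, highest=14):
    """Return the highest level read satisfactorily, or None if none is.

    start_level: level of the passage just below the one the student
        says he cannot read.
    passes: hypothetical callable taking a level and returning True when
        reading and answers at that level are judged satisfactory.
    """
    level = max(lowest, min(start_level, highest))
    if passes(level):
        # Success on the first passage: move up until a level is too hard.
        while level < highest and passes(level + 1):
            level += 1
        return level
    # Failure on the first passage: move down until performance is satisfactory.
    while level > lowest:
        level -= 1
        if passes(level):
            return level
    return None

# Hypothetical student whose true limit is level 6 but who starts at level 9.
print(find_reading_level(9, lambda level: level <= 6))
```

Because the student's own estimate sets the starting point, only a few
passages need to be read in either direction, which is where the claimed
efficiency comes from.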

The interactive procedure described above permits a rapid evaluation
of the student's level of competence, and the degree to which performance
changes with the difficulty of the text. After the reading test is
completed, the tester then presents the student with a passage one
difficulty level above his limit, and asks the student to follow along as
the passage is read to him. Comprehension questions are asked in the usual
fashion, and successively more difficult passages are presented until he
fails to answer most of the questions correctly. The limits of listening
comprehension are thereby established, which measures the contrast between
reading and listening comprehension.
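The reading-listening contrast amounts to a difference between two limits
on the same graded scale. The sketch below assumes both limits are
expressed as passage levels; the `margin` threshold and the wording of the
descriptions are hypothetical choices of ours, not values taken from IRAS.

```python
def reading_listening_contrast(reading_limit, listening_limit):
    """Return the listening limit minus the reading limit, in passage levels."""
    return listening_limit - reading_limit

def describe(reading_limit, listening_limit, margin=1):
    """Give a rough verbal reading of the contrast (margin is an assumption)."""
    contrast = reading_listening_contrast(reading_limit, listening_limit)
    if contrast > margin:
        return "understands more when listening than when reading"
    return "reading and listening limits are similar"

# Hypothetical student: reads through level 5, follows spoken text through level 9.
print(reading_listening_contrast(5, 9), describe(5, 9))
```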

Examination of the questions in Figure 7 reveals a structure that
includes variation in type of question, and productive versus recognition
response demands. For questions 1 to 4, the sequence ranges from specific
details through a summarization. The fifth question in each series places
a different demand on the student: he must answer a question that is not
answered by the passage, using knowledge that is assumed by the writer to
be part of the reader's experience, and that is important for full
understanding of the passage. On the one hand, it is reasonable to ask that
most comprehension questions should be passage-dependent (i.e., should be
based on information contained within the passage). But it is equally true
that virtually anything a person reads makes sense only as external
knowledge is brought to bear for interpretation (Bower, 1976). In IRAS, a
sample of such knowledge is tested explicitly.

On the surface, IRAS resembles informal reading inventories and tests
like the Gray Oral Reading Test (Gray, 1967). Indeed, portions of IRAS
are modeled after procedures used in the Gray Oral. However, the design
of the system permits measurement of contrastive difference scores, which
can reflect relative strengths and weaknesses. Moreover, the incorporation
of probes and explicit decision strategies serves to formalize the clinical
features of the test. The tester not only is able to follow his nose; the
test actually points the way. For instance, the tester is able to tell how
the student performs when he is given a hint. For the student whose problem
is lack of confidence rather than knowledge, it is important for the tester
to learn that he can do very well on a comprehension task when he is prodded
but fails when left on his own. This contrast in performance suggests that
his problem has little to do with comprehension per se.

Summing Up and Some Recommendations

The theme in what we have proposed above is that a test ought to
provide useful information about separable features of the collection of
skills known as reading. The level of detail in the breakdown of performance
skills should depend on the decision-maker's need for information. To be
useful, the information needs to have structure and organization. The
district superintendent is not helped by being told the average percentage
of correct responses for each of 3[?] behavioral objectives for each school
in the district. For that matter, neither is the teacher likely to be helped
by knowing the same information about each student in the class. To be
told that the competence in decoding and vocabulary skills is relatively
higher than competence in literal comprehension skills provides a more
reasonable basis for action.

This theme may not seem to suggest much change from present procedures.
After all, most standardized achievement tests provide a breakdown into
subtests, do they not? There are two differences between what is being
suggested in this paper and existing practices. First, present tests,
shackled by the restrictions to group administration and multiple-choice
format, do not provide adequate coverage of all the relevant areas of
competence. For instance, we currently know little about what students know
about decoding, because such information requires that the student read
words or text aloud. Second, we have emphasized the influence of the
testing context on performance, and the related matter of measuring
corollary aspects of performance. If a student succeeds in performing a
task under one condition but fails under another, then basic knowledge is
assured; it is the student's ability to apply that knowledge that is in
question. Some students do poorly on a group test because of distraction
and lack of motivation. Tested individually, with care to assure that they
understand what is required, and with the attention and interest of another
human being to motivate them, the same students may do quite well. From the
point of view of someone who has to decide what action to take to help the
student improve his performance, this latter information would seem quite
important. Even in a closely monitored individual testing situation, the
way that a student behaves may be an important piece of information. The
student who is obviously concentrating, who tries alternatives when he
suspects he is wrong, whose posture and wrinkled brow show dedication to
the task, and who still fails to perform well, requires different treatment
from the student who fails and who also exhibits obvious lack of attention,
hyperactive movement, or disinterest.

There is only a modest amount of research directly based on the ideas
in this paper, and so recommendations for action should be received with
caution. However, based on our knowledge and experience in reading
research, we feel relatively confident in presenting three concrete
recommendations that depart substantially from present practices:

--Do less massive, "broad-band" testing, but improve the quality
  (the reliability and informativeness) of what testing is done
  (Venezky, 1974). Don't do away with all testing, for that weakens
  accountability.

--Look to instruction as the model for what to test, and then consider
  the influence of the testing situation, the tester, and the
  materials. For the teacher, the best question is often, "Are there any
  conditions under which the student can succeed at this task?"
  Reading teachers teach several different things, and what is taught
  will vary from one level to another. Assessment systems should be
  designed to reflect these differences, and the emphasis should be
  on the reliability of the patterns of these differences.

--Information must be organized if it is to be useful. Too often,
  educational decision-makers have the option of too little
  information (a single test score) or too much information (a myriad of
  behavioral objective scores). Theory provides a useful tool for
  organizing knowledge. In reading we know enough about the
  phenomenon to build models of the process that are of practical value
  in creating tests and interpreting test data.

These recommendations build on the assumption that, if viewed properly,
the acquisition of reading follows a small number of fairly simple themes,
and that assessment reflecting these themes in a straightforward fashion
can serve directly for decision-making. Research on reading abounds. Much
of it portrays reading as a complex of interactive skills, idiosyncratic
to the individual student-teacher-school combination. Such a description
may be partly true; it certainly lends itself well to the creation of
intricate flow charts and complex computer programs. But we think that
reading is perhaps not so intricate after all. Teaching a child to read is
sometimes a demanding task, but many teachers succeed at this task year
after year; success in this endeavor is certainly more common than success
in teaching a computer to read.

The presumption of "complexity" goes against the canon of parsimony,
but more troubling, it leaves us unable to take action; experience is a
poor guide when every situation is unique. The concept of independent
processes is simple and practical, and readily serves as a basis for action.
Research is paying off. There have been some false leads, and progress has
seemed slow at times. But we believe that the next ten years will see some
significant breakthroughs in the assessment of reading; we are seeing some
useful results already (e.g., McDonald & Elias, 1975). We are not about
to solve all of our problems: curriculum development and teacher training
will not be immediately influenced by improved assessment techniques. But
the availability of a richer information base from the differential
assessment of reading skills will leave decision-makers at all levels in a
better position to find out where they need to take action.






Footnotes 



1

Preparation of this paper was supported in part by a grant from the
Carnegie Foundation. We are grateful to Priscilla Drum, Dorothy Piontkowski,
Barbara Tanner, Kay Thosesen, and Barbara Tingey for their assistance.

2

The justification for some of the simplification may be questionable,
to be sure. For instance, when one looks at the distribution of scores on
the basic literacy index, it is not clear why making twelve correct
responses to the eighteen questions should be identified as success, whereas
making eleven or fewer correct should be failure. There are procedures for
validating such decisions (Calfee, 1977), but these were not in force here.
Moreover, the several indices and test batteries in the Minnesota Assessment
may, in fact, be measuring a single underlying trait ("There were 12
measures of a school's reading performance level. Correlation analysis
showed these 12 to be highly intercorrelated," p. 144). To the degree that
the test is unidimensional, the analysis could need a single index, rather
than the several provided. In fact, we suspect that the correlation is due
to the fact that many of the items are not particularly clear, and that
several of the indices were constructed to use overlapping items.
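One rough way to look at the unidimensionality question raised in this
footnote is to average the correlations among the measures. The sketch
below is only an illustration: the three index names and the school scores
are invented (they are not MEAP data), and a high mean intercorrelation is
at best suggestive of a single underlying trait, not a validation
procedure.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_intercorrelation(measures):
    """Average Pearson r over all pairs of measures (lists of school scores)."""
    pairs = list(combinations(measures.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)

# Invented scores for five schools on three hypothetical indices.
measures = {
    "index_1": [52, 58, 61, 66, 70],
    "index_2": [50, 57, 63, 65, 72],
    "index_3": [55, 59, 60, 68, 71],
}
print(round(mean_intercorrelation(measures), 2))
```

If several indices share items, as the footnote suspects, their
correlations will be inflated for that reason alone, so a figure like this
needs to be read alongside the test's construction.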



[Figure 1 diagram not reproduced. Labels: INDEPENDENT STAGES; GESTALT;
INPUT; OUTPUT.]

Figure 1. Independent-stage model of thought and contrasting Gestalt
          model. A sequence of specifiable components operates on
          information according to the independent-stage model, while
          intricate interactions are postulated in the Gestalt model.
          After Calfee and Floyd, 1972.




[Figure 2 diagram not reproduced. Labels: WORD; Testing Condition (Quiet
Room vs. Noisy Room); Letter-Sound Correspondence (Regular vs. Irregular);
Familiarity (Hi Frequency vs. Lo Frequency).]

Figure 2. Information processing model for word knowledge task.



In this exercise we want to see how well you can recognize a prefix or a
suffix in a word. For each part read the key word and decide whether it has
only a prefix, [...]. Then fill in the oval next to your choice.

Example 1
The word run has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   ● neither a prefix nor a suffix
   O I don't know

Example 2
The word [...]act has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   O neither a prefix nor a suffix
   O I don't know

A. The word careless has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   O neither a prefix nor a suffix
   O I don't know

B. The word disagreeable has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   O neither a prefix nor a suffix
   O I don't know

C. The word discolor has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   O neither a prefix nor a suffix
   O I don't know

D. The word impossible has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   O neither a prefix nor a suffix
   O I don't know

E. The word mister has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   O neither a prefix nor a suffix
   O I don't know

F. The word preheat has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   O neither a prefix nor a suffix
   O I don't know

G. The word reddish has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   O neither a prefix nor a suffix
   O I don't know

H. The word union has
   O only a prefix
   O only a suffix
   O both a prefix and a suffix
   O neither a prefix nor a suffix
   O I don't know

Figure 4. Exercise testing prefixes and suffixes. Data of Minnesota
          Educational Assessment Program (MEAP), 1974.



In this exercise we want to see how well you can use the clues given in a
passage to select the best word to complete a sentence. In each part you
are to read the passage and the four choices which follow it. Then decide
which word best completes the sentence with the blank space and fill in the
oval next to your choice.

Example 1
The sun had set and now everything was ______ outside.
   O light
   ● dark
   O wet
   O warmer
   O I don't know

Example 2
They watched the dog ______ his fleas.
   O scratch
   O cat
   O feed
   O pet
   O I don't know

A. The zebra, unlike the horse, is ______ all over.
   O striped
   O black
   O hairy
   O nervous
   O I don't know

B. After Billy tore his pants, he carried them to his mother and she
   ______ them up.
   O cut
   O folded
   O pushed
   O sewed
   O I don't know

Figure 5. Exercise designed to tap vocabulary knowledge. Data of Minnesota
          Educational Assessment Program (MEAP), 1974.



Original MEAP format (Sample 3):

Variable     Level      No. Schools     Means

SPSESGE3     <20%            68          59.4
             20-40          106          [...]
             >40             52          66.8

SREDMAT      <50%            44          66.6
             50-75          129          63.6
             >75             53          57.3

[...]        <30%            42          58.4
             30-60          142          63.4
             >60             42          64.6

Easier-to-read format:

                                          Average Percent
                                          Correct Answers     No. of
School Measures                           on Reading Test     Schools

Socioeconomic status: percent students
in school from high SES homes:
    less than 20%                               59             (68)
    20-40%                                      63             (106)
    more than 40%                               67             (52)

Percent students in school with
limited reading materials in home:
    less than 50%                               67             (44)
    50-75%                                      64             (129)
    more than 75%                               57             (53)

Percent students using school
library at least once a week:
    less than 30%                               58             (42)
    30-60%                                      63             (142)
    more than 60%                               65             (42)

Figure 6. Illustration of reporting format from MEAP (Chapter 8) and
          easier-to-read format.

          Note: In the original table, three samples are presented,
          but without any clear indication how they differ.
          We present the data for Sample 3 only.



Dr. Albert Einstein's neighbor was worried. Every day her small daughter
went to call on the great scientist. At last the mother went to Einstein.
She told him she was sorry if the girl was keeping him from his work.

"Oh, not at all," Einstein told her. "I like her to come to see me. We get
along quite well."

"But what could you and an eight-year-old girl have in common?" asked the
mother.

"A great deal," said the scientist. "I love the jelly beans she brings me.
And she loves the way I do her arithmetic lesson."

1. How old was the girl in the story?
   6, 8, or 9 years old
2. What two things did the girl bring to Einstein each day?
   Her violin and a letter from her mother; gum drops and her music book;
   her arithmetic lesson and jelly beans
3. What do you believe Einstein thought about the lessons?
   Were they new, easy, or strange for him?


We talk about a dog being man's best friend, but as often as not it's
really the other way around. My Great Dane, Max, for example, seems to
think of me as his pet.

To begin with, he is bigger than I am. Max stands seven and a half feet on
his hind legs and weighs 280 pounds. There is something about a dog as big
as a Shetland pony that keeps you from ordering him around quite as you
would, say, a poodle. But Max has gotten the idea that he was really meant
to be a lap dog. He will come when I am asleep and lie across my legs,
which makes it quite impossible for me to move until he wants me to. And if
he decides to sleep in a spot where I will stumble over him constantly,
well, there is no moving him, of course.

1. How much does Max weigh?
   100, 200, or 300 pounds?
2. In this story Max's owner compares his size to another animal. Name the
   animal.
   A teddy bear, a poodle, or a Shetland pony?
3. How does the author feel about the saying "a dog is man's best friend"?
   He agrees with it; he thinks the opposite is true; he thinks it isn't
   true for Max

Figure 7. Examples of materials from IRAS for testing comprehension of
          narrative passages (portions of passage omitted). After Calfee &
          Calfee, 1977.



